Goto

Collaborating Authors

 bayesian logistic regression



Appendices

Neural Information Processing Systems

Additionally, to avoid gradients with infinite means even if DL is not contractive, we consider a spectral normalisation, so that instead of computing recursively ฮท0 = ฮต and ฮทk = DLฮทk 1 for k {1,...,N},weset ฮท0 =ฮตand The motivation was to have a quadratic increase for the penalty term if the largest absolute eigenvalue approaches 1, and then smoothly switch to a linear function for values larger than ฮด2. The suggested approach can perform poorly for non-convex potentials or even convex potentials such as arsing in a logistic regression model for some data sets. The idea now is to run HMC with unit mass matrix for the transformed variables z = f 1(q) where q ฯ€. Hessian-vector products can similarly be computed using vector-Jacobian products: With g(z) = grad( U,z), we then compute 2 U(z)w = vjp(g,z,w)> for z = f 1(stop grad(f(zbL/2c)). We also stop all U gradients, i.e.




Myopic Bayesian Decision Theory for Batch Active Learning with Partial Batch Label Sampling

arXiv.org Machine Learning

Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners with an unclear choice of which to use. Bayesian Decision Theory (BDT) offers a universal principle to guide decision-making. In this work, we derive BDT for (Bayesian) active learning in the myopic framework, where we imagine we only have one more point to label. This derivation leads to effective algorithms such as Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and other algorithms that appear in the literature. Furthermore, we show that BAIT (active learning based on V-optimal experimental design) can be derived from BDT and asymptotic approximations. A key challenge of such methods is the difficult scaling to large batch sizes, leading to either computational challenges (BatchBALD) or dramatic performance drops (top-$B$ selection). Here, using a particular formulation of the decision process, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally for several datasets that ParBaLS EPIG gives superior performance for a fixed budget and Bayesian Logistic Regression on Neural Embeddings. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.



Appendices A Gradient terms for the adaptation scheme

Neural Information Processing Systems

A.1 Gradients for the entropy approximation Following the arguments in [13], we can compute the gradient of the term in (13) using ฮธ A.2 Gradients for the penalty function We used the following penalty function h( x) = ( x ฮด) A.3 Gradients for the energy error We can write the energy error as (q We generalise the arguments from [14], Lemma 7. Proceeding by induction over n, we have for the case n = 1, for any v R The suggested approach can perform poorly for non-convex potentials or even convex potentials such as arsing in a logistic regression model for some data sets. We illustrate here how to learn a reasonable proposal for a general potential function by considering some version of position-dependent preconditioning. The transformation f as well as U generally depend on some parameters ฮธ that we again omit for a less convoluted notation. Our approach can be seen as an alternative for instance to [31] where such a transformation is first learned by trying to approximate ฯ€ with a standard Gaussian density using variational inference, while the HMC hyperparameters are adapted in a second step using Bayesian optimisation. The motivation for stopping the gradients comes from considering the special case f: z null Cz that corresponds to the position-independent preconditioning scheme above.



Hotel Booking Cancellation Prediction Using Applied Bayesian Models

arXiv.org Artificial Intelligence

This study applies Bayesian models to predict hotel booking cancellations, a key challenge affecting resource allocation, revenue, and customer satisfaction in the hospitality industry. Using a Kaggle dataset with 36,285 observations and 17 features, Bayesian Logistic Regression and Beta-Binomial models were implemented. The logistic model, applied to 12 features and 5,000 randomly selected observations, outperformed the Beta-Binomial model in predictive accuracy. Key predictors included the number of adults, children, stay duration, lead time, car parking space, room type, and special requests. Model evaluation using Leave-One-Out Cross-Validation (LOO-CV) confirmed strong alignment between observed and predicted outcomes, demonstrating the model's robustness. Special requests and parking availability were found to be the strongest predictors of cancellation. This Bayesian approach provides a valuable tool for improving booking management and operational efficiency in the hotel industry.


Quasi-Newton Methods for Markov Chain Monte Carlo

Neural Information Processing Systems

The performance of Markov chain Monte Carlo methods is often sensitive to the scaling and correlations between the random variables of interest. An important source of information about the local correlation and scale is given by the Hessian matrix of the target distribution, but this is often either computationally expensive or infeasible. In this paper we propose MCMC samplers that make use of quasi-Newton approximations, which approximate the Hessian of the target distribution from previous samples and gradients generated by the sampler. A key issue is that MCMC samplers that depend on the history of previous states are in general not valid. We address this problem by using limited memory quasi-Newton methods, which depend only on a fixed window of previous samples. On several real world datasets, we show that the quasi-Newton sampler is more effective than standard Hamiltonian Monte Carlo at a fraction of the cost of MCMC methods that require higher-order derivatives.